The use of trigram analysis for spelling error detection
نویسندگان
چکیده
Work performed under the SPElling Error Detection Correction Project (SPEEDCOP) supported by National Science Foundation (NSF) at Chemical Abstracts Service (CAS) to devise effective automatic methods of detecting and correcting misspellings in scholarly and scientific text is described. The investigation was applied to 50,000 wordlmisspelling pairs collected from six datasets (Chemical Industry Notes (GIN), Biological Abstracts (BA). Chemical Abstracts (CA), America1 Chem~cai Society primary journal keyboarding (ACS), Information Science Abstracts (ISA), and Distributed On-Line Editing (DOLE) (a CAS internal dataset especially suited to spelling error studies). The purpose of this study was to determine the utility of trigram analysis in the automatic detection and/or correction of misspellings. Computer programs were developed to collect data on trigram distribution in each dataset and to explore the potential of trigram analysis for detecting spelling errors, verifying correctly-spelled words, locating the error site within a misspelling, and distinguishing between the basic kinds of spelling errors. The results of the trigram analysis were largely independent of the dataset to which it was applied but trigram compositions varied with the dataset. The trigram analysis technique developed determined the error site within a misspelling accurately. but did not distinguish effectively between different error types or between valid words and misspellings. However. methods for increasing its accuracy are suggested.
منابع مشابه
Detection is the central problem in real-word spelling correction
Real-word spelling correction differs from non-word spelling correction in its aims and its challenges. Here we show that the central problem in real-word spelling correction is detection. Methods from non-word spelling correction, which focus instead on selection among candidate corrections, do not address detection adequately, because detection is either assumed in advance or heavily constrai...
متن کاملReal-Word Spelling Correction with Trigrams: A Reconsideration of the Mays, Damerau, and Mercer Model
The trigram-based noisy-channel model of real-word spelling-error correction that was presented by Mays, Damerau, and Mercer in 1991 has never been adequately evaluated or compared with other methods. We analyze the advantages and limitations of the method, and present a new evaluation that enables a meaningful comparison with the WordNet-based method of Hirst and Budanitsky. The trigram method...
متن کاملDesign and implementation of Persian spelling detection and correction system based on Semantic
Persian Language has a special feature (grapheme, homophone, and multi-shape clinging characters) in electronic devices. Furthermore, design and implementation of NLP tools for Persian are more challenging than other languages (e.g. English or German). Spelling tools are used widely for editing user texts like emails and text in editors. Also developing Persian tools will provide Persian progr...
متن کاملCooperative Error Handling and Shallow Processing
This paper is concerned with the detection and correction of sub-sentential English text errors. Previous spelling programs, unless restricted to a very small set of words, have operated as post-processors. And to date, grammar checkers and other programs which deal with ill-formed input usually step directly from spelling considerations to a full-scale parse, assuming a complete sentence. Work...
متن کاملA simple real-word error detection and correction using local word bigram and trigram
Spelling error is broadly classified in two categories namely non word error and real word error. In this paper a localized real word error detection and correction method is proposed where the scores of bigrams generated by immediate left and right neighbour of the candidate word and the trigram of these three words are combined. A single character position error model is assumed so that if a ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Inf. Process. Manage.
دوره 17 شماره
صفحات -
تاریخ انتشار 1981